We establish a link between the maximization of the Kolmogorov-Sinai entropy (KSE) and the minimization of the mixing time for general Markov chains. Since the KSE can be maximized analytically and is in general easier to compute than the mixing time, this link provides a new, faster method to approximate the minimum mixing time dynamics. It could be of interest in computer science and statistical physics, for computations that use random walks on graphs that can be represented as Markov chains.


Many modern techniques of physics, such as the computation of path integrals, now rely on random walks on graphs that can be represented as Markov chains. Techniques to estimate the number of steps the chain needs to reach its stationary distribution (the so-called “mixing time”) are of great importance for estimating the running times of such sampling algorithms [1] (for a review of existing techniques, see e.g. [2]). On the other hand, studies of the link between the topology of a graph and the diffusion properties of the random walk on this graph are often based on the entropy rate, computed using the Kolmogorov-Sinai entropy (KSE) [3]. For example, one can investigate dynamics on a network that maximizes the KSE to study optimal diffusion [3], or obtain an algorithm that produces equiprobable paths on non-regular graphs [3].

In this letter, we establish a link between these two notions by showing that, for a system that can be represented by a Markov chain, a non-trivial relation exists between the maximization of the KSE and the minimization of the mixing time. Since the KSE is in general easier to compute than the mixing time, this link provides a new, faster method to approximate the minimum mixing time, which could be of interest in computer science and statistical physics, and it gives a physical meaning to the KSE. We first show that, on average, the greater the KSE, the smaller the mixing time, and we relate this result to the eigenvalues of the transition matrix. Then, we show that the dynamics that maximizes the KSE is close to the one minimizing the mixing time, both in the sense of the optimal diffusion coefficient and in the sense of the transition matrix.

Consider a network with $m$ nodes, on which a particle jumps randomly. This process can be described by a finite Markov chain defined by its adjacency matrix $A$ and its transition matrix $P$: $A(i,j) = 1$ if and only if there is a link between nodes $i$ and $j$, and 0 otherwise, and $P = (p_{ij})$, where $p_{ij}$ is the probability for a particle at node $i$ to hop to node $j$. Let us introduce the probability density at time $n$, $\mu_n = (\mu_n^i)_{i=1,\dots,m}$, where $\mu_n^i$ is the probability that the particle is at node $i$ at time $n$. Starting from a probability density $\mu_0$, the evolution of the probability density reads $\mu_{n+1} = P^t \mu_n$, where $P^t$ is the transpose of $P$.

Throughout this paper, we assume that the Markov chain is irreducible and thus has a unique stationary state $\pi_{\rm stat}$.

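Let us define:

$$d(n) = \max_{\mu} \|(P^t)^n \mu - \pi_{\rm stat}\|, \tag{1}$$

where the maximum is taken over all probability densities $\mu$ and $\|\cdot\|$ is a norm on $\mathbb{R}^m$. For $\epsilon > 0$, the mixing time, which corresponds to the time after which the system is within a distance $\epsilon$ of its stationary state, is defined as follows:

$$t(\epsilon) = \min\{n : d(n) < \epsilon\}. \tag{2}$$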
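As an illustration of Eqs. (1) and (2), here is a minimal NumPy sketch that computes $t(\epsilon)$ for a given transition matrix. It assumes the Euclidean norm (the choice made for Fig. 1) and uses the fact that, a norm being convex, the maximum in Eq. (1) over probability densities is reached at a point mass, so that $d(n)$ is the largest Euclidean distance between a row of $P^n$ and $\pi_{\rm stat}$. The function names and the iteration cap `n_max` are hypothetical choices made for this sketch.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary density pi_stat of an irreducible row-stochastic matrix P."""
    # pi_stat solves P^t pi = pi: take the eigenvector of P^t whose eigenvalue is closest to 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()  # normalizing also removes the arbitrary sign of the eigenvector


def mixing_time(P, eps=1e-3, n_max=10_000):
    """t(eps) = min{n : d(n) < eps} of Eq. (2), with the Euclidean norm (assumption)."""
    pi = stationary_distribution(P)
    Pn = np.eye(P.shape[0])  # P^0
    for n in range(n_max + 1):
        # Row i of P^n is (P^t)^n applied to the point mass on node i;
        # by convexity of the norm, d(n) is the largest of these distances.
        if np.max(np.linalg.norm(Pn - pi, axis=1)) < eps:
            return n
        Pn = Pn @ P  # advance one step: P^(n+1)
    raise RuntimeError("d(n) did not drop below eps within n_max steps")
```

For the two-state chain with rows $(0.9, 0.1)$ and $(0.2, 0.8)$, whose second eigenvalue is $0.7$, `mixing_time` returns 20 for $\epsilon = 10^{-3}$.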

For a Markov chain, the KSE takes the analytical form:

$$h_{KS} = -\sum_{i,j} \pi_{{\rm stat},i}\, p_{ij} \log p_{ij}. \tag{3}$$
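Eq. (3) only requires $P$ and $\pi_{\rm stat}$, which is why the KSE is cheap to evaluate compared with the mixing time. A minimal NumPy sketch, assuming natural logarithms and the convention $0 \log 0 = 0$ (the function name is ours):

```python
import numpy as np


def ks_entropy(P, pi):
    """Kolmogorov-Sinai entropy of Eq. (3): h_KS = -sum_ij pi_i p_ij log p_ij.

    Natural logarithm; entries with p_ij = 0 contribute 0 (convention 0 log 0 = 0).
    """
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    plogp = np.zeros_like(P)
    mask = P > 0
    plogp[mask] = P[mask] * np.log(P[mask])
    return -np.sum(pi[:, None] * plogp)
```

For instance, `ks_entropy(np.full((2, 2), 0.5), [0.5, 0.5])` gives $\log 2 \approx 0.693$, while a chain that never moves (the identity matrix) has zero KSE.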

Random Markov matrices of size $m$ are generated by assigning to each $p_{ij}$ ($i \neq j$) a random number between 0 and $1/m$, and by setting $p_{ii} = 1 - \sum_{j \neq i} p_{ij}$. The mean KSE is plotted versus the mixing time (Fig. 1) by computing $h_{KS}$ and $t(\epsilon)$ for each random matrix. Fig. 1 shows that the KSE is, on average, a decreasing function of the mixing time.
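The numerical experiment above can be sketched as follows. This is a minimal, self-contained version under stated assumptions: the upper bound $1/m$ for the off-diagonal entries is our reading of the text, the Euclidean norm is used as in Fig. 1, and the sample count is much smaller than the $10^6$ matrices of the figure. All function names are hypothetical.

```python
import numpy as np


def random_markov_matrix(m, rng):
    """Off-diagonal p_ij uniform in [0, 1/m) (assumed bound); p_ii = 1 - sum_{j != i} p_ij."""
    P = rng.uniform(0.0, 1.0 / m, size=(m, m))
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rows sum to 1, diagonal stays positive
    return P


def stationary_distribution(P):
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def ks_entropy(P, pi):
    # Eq. (3), with 0 log 0 = 0
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * plogp)


def mixing_time(P, pi, eps=1e-3, n_max=10_000):
    # Eq. (2) with the Euclidean norm; d(n) is reached at a point mass
    Pn = np.eye(len(P))
    for n in range(n_max + 1):
        if np.max(np.linalg.norm(Pn - pi, axis=1)) < eps:
            return n
        Pn = Pn @ P
    raise RuntimeError("no convergence within n_max steps")


def mean_kse_by_mixing_time(m=10, samples=1000, eps=1e-3, seed=0):
    """Average h_KS over random matrices sharing the same t(eps) (top panel of Fig. 1)."""
    rng = np.random.default_rng(seed)
    groups = {}
    for _ in range(samples):
        P = random_markov_matrix(m, rng)
        pi = stationary_distribution(P)
        groups.setdefault(mixing_time(P, pi, eps), []).append(ks_entropy(P, pi))
    return {t: float(np.mean(h)) for t, h in sorted(groups.items())}
```

Plotting the values of the returned dictionary against its keys gives the analogue of the top panel of Fig. 1; since the relation only holds on average, small samples can be noisy.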

We stress that this relation holds only on average. Indeed, we can find two particular Markov chains $P_1$ and $P_2$ such that $h_{KS}(P_1) < h_{KS}(P_2)$ and $t_1(\epsilon) < t_2(\epsilon)$. We illustrate this point further below.
The link between the mixing time and the KSE can be understood through their dependence on the eigenvalues of the transition matrix. A general irreducible transition matrix $P$ is not necessarily diagonalizable on $\mathbb{R}$. However, since $P$ is chosen randomly, it is almost surely diagonalizable on $\mathbb{C}$. According to the Perron-Frobenius theorem, the largest eigenvalue is 1, and the associated eigenspace of $P^t$ is one-dimensional and spanned by $\pi_{\rm stat}$. Without loss of generality, we can label the eigenvalues in decreasing order of their modulus:


$$1 = \lambda_1 > |\lambda_2| \geq \dots \geq |\lambda_m| \geq 0.$$

The rate of convergence toward $\pi_{\rm stat}$ is governed by the second largest eigenvalue modulus of $P$ [5]:

$$\lambda(P) = \max_{i \geq 2} |\lambda_i| = |\lambda_2|.$$

The eigenvalues $\lambda_1 = 1, \dots, \lambda_m$ of $P$ and $P^t$ being equal, let us denote by $v_1 = \pi_{\rm stat}, v_2, \dots, v_m$ the associated eigenvectors of $P^t$. Decomposing a generic initial probability density $\mu_0$ on this basis, we find, for large $n$:

$$\|(P^t)^n \mu_0 - \pi_{\rm stat}\| \propto \lambda(P)^n. \tag{4}$$

Combining Eqs. (2) and (4), $\lambda(P)^{t(\epsilon)} \propto \epsilon$, i.e. $\lambda(P) \propto \epsilon^{1/t(\epsilon)}$. Hence, the smaller $\lambda(P)$, the shorter the mixing time (Fig. 1). Since $h_{KS}$ is, on average, a decreasing function of $t(\epsilon)$ and $\lambda(P)$ is an increasing function of $t(\epsilon)$, we deduce that $h_{KS}$ is, on average, a decreasing function of $\lambda(P)$.

FIG. 1: Averaged KSE versus mixing time (top) for $10^6$ random matrices of size $m = 10$, and averaged $\lambda(P)$ versus mixing time (bottom) for $10^6$ random matrices of size $m = 10$ (blue curve), together with $f(t) = \epsilon^{1/t}$ (red curve). Here $\epsilon = 10^{-3}$ and the norm is the Euclidean one.

This link between maximum KSE and minimum mixing time also extends naturally to optimal diffusion coefficients. This notion was introduced by Gómez-Gardeñes and Latora [3] for networks represented by a Markov chain that depends on a diffusion coefficient. Based on the observation that, in such networks, the KSE has a maximum as a function of the diffusion coefficient, they define an optimal diffusion coefficient as the value of the diffusion coefficient corresponding to this maximum. In the same spirit, one can define an optimal diffusion coefficient with respect to the mixing time: the value of the diffusion coefficient that minimizes the mixing time or, equivalently, the second largest eigenvalue modulus $\lambda(P)$. This roughly corresponds to the diffusion model that reaches the stationary state in the shortest time. To define such an optimal diffusion coefficient, we follow Gómez-Gardeñes and Latora and vary the transition probabilities according to the degrees of the graph nodes. More precisely, if $k_i = \sum_j A(i,j)$ denotes the degree of node $i$, we set:

$$p_{ij} = \frac{A_{ij}\, k_j^{\alpha}}{\sum_l A_{il}\, k_l^{\alpha}}. \tag{5}$$


If $\alpha < 0$ we favor transitions towards low-degree nodes, if $\alpha = 0$ we recover the standard random walk on the network, and if $\alpha > 0$ we favor transitions towards high-degree nodes. We assume here that $A$ is symmetric. It may then be checked that the stationary probability density is:

$$\pi_{{\rm stat},i} = \frac{c_i\, k_i^{\alpha}}{\sum_j c_j\, k_j^{\alpha}}, \tag{6}$$

where $c_i = \sum_j A_{ij}\, k_j^{\alpha}$.

Using Eqs. (5) and (6), we check that the transition matrix is reversible, i.e. $\pi_{{\rm stat},i}\, p_{ij} = \pi_{{\rm stat},j}\, p_{ji}$, and hence has $m$ real eigenvalues. From this stationary probability density, we can thus compute both the KSE and the second largest eigenvalue modulus $\lambda(P)$ as functions of $\alpha$. The result is shown in Fig. 2.
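The $\alpha$-scan behind Fig. 2 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the random symmetric adjacency generator (each entry set to 0 with probability $\rho$, self-loops allowed) is a hypothetical stand-in for the network of Fig. 2, every node is assumed to have at least one link, and $\lambda(P)$ is obtained from the symmetric matrix $\Pi^{1/2} P\, \Pi^{-1/2}$, which has the same spectrum as $P$ because $P$ is reversible.

```python
import numpy as np


def random_symmetric_adjacency(m, rho, rng):
    """Hypothetical generator: symmetric 0/1 matrix, each entry set to 0 with probability rho."""
    upper = np.triu(rng.random((m, m)) >= rho)
    return (upper | upper.T).astype(float)


def degree_biased_walk(A, alpha):
    """Transition matrix of Eq. (5) and stationary density of Eq. (6)."""
    k = A.sum(axis=1)            # degrees k_i (assumed > 0: no isolated node)
    W = A * k[None, :] ** alpha  # A_ij k_j^alpha
    c = W.sum(axis=1)            # c_i = sum_j A_ij k_j^alpha
    P = W / c[:, None]
    pi = c * k ** alpha
    return P, pi / pi.sum()


def ks_entropy(P, pi):
    # Eq. (3), with 0 log 0 = 0
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * plogp)


def slem(P, pi):
    """lambda(P) for a reversible P: Pi^(1/2) P Pi^(-1/2) is symmetric with the same spectrum."""
    s = np.sqrt(pi)
    S = s[:, None] * P / s[None, :]
    ev = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2)))[::-1]
    return ev[1]


def sweep(A, alphas):
    """(alpha, h_KS, lambda(P)) for each alpha, as in the two panels of Fig. 2."""
    out = []
    for a in alphas:
        P, pi = degree_biased_walk(A, a)
        out.append((a, ks_entropy(P, pi), slem(P, pi)))
    return out
```

With, for example, `A = random_symmetric_adjacency(400, 1/3, np.random.default_rng(0))` and `alphas = np.linspace(-2, 2, 41)`, the arg-max of the second column and the arg-min of the third column give estimates of $\alpha_{KS}$ and $\alpha_{mix}$.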
FIG. 2: KSE (top) and $\lambda(P)$ (bottom) as functions of $\alpha$ for a network of size $m = 400$ with a proportion of zeros in $A$ equal to 1/3.

We observe in Fig. 2 that the KS entropy has a maximum at a value that we denote $\alpha_{KS}$, in agreement with the findings of [3]. Likewise, $\lambda(P)$ (i.e. the mixing time) presents a minimum at $\alpha = \alpha_{mix}$. Moreover, $\alpha_{KS}$ and $\alpha_{mix}$ are close, which means that the two optimal diffusion coefficients are close to each other. Furthermore, looking at the ends of the two curves, we can find two particular Markov chains $P_1$ and $P_2$ such that $h_{KS}(P_1) < h_{KS}(P_2)$ and $t_1(\epsilon) < t_2(\epsilon)$, illustrating that the link between the KSE and the minimum mixing time holds only in a statistical sense.

We have thus shown that, when comparing transition matrices $P$ (or, equivalently, jump rules), on average the greater the KSE, the smaller the mixing time. We now investigate whether a similar property holds for the dynamics, i.e. whether the transition rules that maximize the KSE are close to those minimizing the mixing time. For a given network, i.e. for a fixed $A$, there is a well-known procedure to compute the transition matrix $P_{KS}$ which maximizes the KSE under the constraints $A(i,j) = 0 \Rightarrow P_{KS}(i,j) = 0$ [5]. It proceeds as follows: let $\lambda$ denote the largest eigenvalue of $A$ and $\psi$ the associated normalized eigenvector, i.e. $A\psi = \lambda\psi$ and $\|\psi\| = 1$ (by the Perron-Frobenius theorem, $\psi$ can be chosen with positive components). We define $P_{KS}$ such that:

$$P_{KS}(i,j) = \frac{A_{ij}\, \psi_j}{\lambda\, \psi_i}. \tag{7}$$
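The construction of Eq. (7) and the properties derived from it in the following paragraphs (stochastic rows, stationary density $\psi^2$, reversibility, $h_{KS} = \log\lambda$, and the path-counting growth rate $\frac{1}{n}\log N_n$) can be checked numerically. A minimal NumPy sketch, assuming $A$ is symmetric, nonnegative and irreducible; the function names and the default number of steps are hypothetical.

```python
import numpy as np


def max_entropy_walk(A):
    """P_KS of Eq. (7) for a symmetric, irreducible adjacency matrix A.

    Returns (P_KS, lam, psi) with A psi = lam psi and ||psi|| = 1.
    """
    w, V = np.linalg.eigh(A)             # real eigenvalues in ascending order
    lam, psi = w[-1], np.abs(V[:, -1])   # Perron vector: one sign throughout, taken positive
    P = A * psi[None, :] / (lam * psi[:, None])
    return P, lam, psi


def ks_entropy(P, pi):
    # Eq. (3), with 0 log 0 = 0
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * plogp)


def path_growth_rate(A, n=500):
    """(1/n) log N_n with N_n = sum_ij (A^n)_ij, rescaled at each step to avoid overflow."""
    v = np.ones(len(A))
    log_N = 0.0
    for _ in range(n):
        v = A @ v
        s = v.sum()
        log_N += np.log(s)  # accumulates log N_k - log N_(k-1)
        v /= s
    return log_N / n
```

With `P, lam, psi = max_entropy_walk(A)`, the density `psi**2` is stationary for `P`, `ks_entropy(P, psi**2)` agrees with `np.log(lam)` to rounding error, and `path_growth_rate(A)` approaches `np.log(lam)` up to an $O(1/n)$ correction.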


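We have $\sum_j P_{KS}(i,j) = 1$ for all $i$. Moreover, using the fact that $A$ is symmetric, we find:

$$\sum_i \psi_i^2\, P_{KS}(i,j) = \sum_i \frac{\psi_i\, A_{ij}\, \psi_j}{\lambda} = \psi_j^2. \tag{8}$$

Hence, $P_{KS}^t\, \psi^2 = \psi^2$, where $\psi^2 = (\psi_i^2)_{i=1,\dots,m}$, and the stationary density of $P_{KS}$ is $\pi_{\rm stat} = \psi^2$ (which sums to 1 since $\|\psi\| = 1$). Using Eqs. (3) and (7), we have:

$$h_{KS} = -\sum_{(i,j)} \frac{\psi_i\, \psi_j}{\lambda} \log\left(\frac{\psi_j}{\lambda\, \psi_i}\right), \tag{9}$$

where the sum runs over the pairs $(i,j)$ such that $A(i,j) = 1$. Eq. (9) can be split into two terms:

$$h_{KS} = \sum_{(i,j)} \frac{\psi_i\, \psi_j}{\lambda} \log \lambda - \sum_{(i,j)} \frac{\psi_i\, \psi_j}{\lambda} \log\left(\frac{\psi_j}{\psi_i}\right). \tag{10}$$

The first term is equal to $\log \lambda$ because $\psi$ is an eigenvector of $A$, so that $\sum_{(i,j)} \psi_i \psi_j = \psi^t A \psi = \lambda$; the second term is equal to 0 due to the symmetry of $A$, its summand being antisymmetric under the exchange $i \leftrightarrow j$. Thus:

$$h_{KS} = \log \lambda. \tag{11}$$

Moreover, for a Markov chain, the number of trajectories of length $n$ is $N_n = \sum_{(i,j)} (A^n)_{ij}$. For a Markov chain, the KSE can be seen as the time derivative of the path entropy, so that the KSE is maximal when the paths are equiprobable. For asymptotically long times, the maximal KSE is therefore:

$$h_{KS}^{\max} = \lim_{n \to \infty} \frac{1}{n} \log N_n = \log \lambda, \tag{12}$$

as follows by diagonalizing $A$. Using Eqs. (11) and (12), we find that $P_{KS}$ defined as in Eq. (7) maximizes the KSE. Finally, $P_{KS}$ satisfies $\pi_{{\rm stat},i}\, P_{KS}(i,j) = \pi_{{\rm stat},j}\, P_{KS}(j,i)$ for all $(i,j)$, and thus $P_{KS}$ is reversible.

In a similar way, we can search for a transition matrix $P_{mix}$ which minimizes the mixing time or, equivalently, the transition matrix minimizing its second largest eigenvalue modulus $\lambda(P)$. This problem is much more difficult to solve than the first one, given that the eigenvalues of $P_{mix}$ can be complex. Nevertheless, two cases where the matrix $P_{mix}$ is diagonalizable on $\mathbb{R}$ can be solved [8]: the case where $P_{mix}$ is symmetric, and the case where $P_{mix}$ is reversible for a given, fixed stationary distribution. Let us first consider the case where $P$ is symmetric. The minimization problem takes the following form:

$$\min_{P \in \mathcal{S}_m} \lambda(P) \quad \text{subject to} \quad P(i,j) \geq 0,\ P\mathbf{1} = \mathbf{1},\ A(i,j) = 0 \Rightarrow P(i,j) = 0, \tag{13}$$

where $\mathcal{S}_m$ denotes the set of real symmetric $m \times m$ matrices and $\mathbf{1} = (1, \dots, 1)^t$. Since $\lambda$ is a convex function on $\mathcal{S}_m$ and the set of feasible stochastic matrices is convex and compact, this problem admits a global minimum.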

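Since $P$ is symmetric and stochastic, $\mathbf{1}$ is an eigenvector associated with the largest eigenvalue of $P$, so the eigenvectors associated with $\lambda(P)$ lie in the orthogonal complement $\mathbf{1}^\perp$. The orthogonal projection onto $\mathbf{1}^\perp$ is $I_d - \frac{1}{m}\mathbf{1}\mathbf{1}^t$.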

Moreover, if we take the matrix norm induced by the Euclidean norm, i.e. $|||M||| = \max_{x \in \mathbb{R}^m,\, x \neq 0} \|Mx\| / \|x\|$ for any matrix $M$, it is equal to the square root of the largest eigenvalue of $MM^t$; hence, if $M$ is symmetric, it is equal to the largest modulus of its eigenvalues.

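Then the minimization problem can be rewritten as:

$$\min_{P = P^t} |||(I_d - \tfrac{1}{m}\mathbf{1}\mathbf{1}^t)\, P\, (I_d - \tfrac{1}{m}\mathbf{1}\mathbf{1}^t)||| = \min_{P = P^t} |||P - \tfrac{1}{m}\mathbf{1}\mathbf{1}^t||| \quad \text{subject to} \quad P(i,j) \geq 0,\ P\mathbf{1} = \mathbf{1},\ A(i,j) = 0 \Rightarrow P(i,j) = 0, \tag{14}$$

where the equality holds because $P\mathbf{1} = P^t\mathbf{1} = \mathbf{1}$ for a symmetric stochastic $P$.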

We solve this constrained optimization problem (Karush-Kuhn-Tucker conditions) with Matlab, and we denote by $P_{mix}$ the matrix that minimizes it.
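The identity behind Eq. (14), namely that $|||P - \frac{1}{m}\mathbf{1}\mathbf{1}^t|||$ equals $\lambda(P)$ for a symmetric stochastic $P$, can be checked numerically. The sketch below does not reproduce the Matlab/KKT solver; it only evaluates the objective at one feasible point. That point, the so-called max-degree chain, is our own illustrative choice (not the paper's $P_{mix}$) and assumes self-loops are allowed, i.e. $A(i,i) = 1$; the function names are hypothetical.

```python
import numpy as np


def slem(P):
    """lambda(P) for a symmetric stochastic P: second largest eigenvalue modulus."""
    return np.sort(np.abs(np.linalg.eigvalsh(P)))[::-1][1]


def objective_14(P):
    """|||P - (1/m) 1 1^t||| in the spectral norm (largest singular value)."""
    m = P.shape[0]
    return np.linalg.norm(P - np.full((m, m), 1.0 / m), 2)


def max_degree_chain(A):
    """Hypothetical feasible point of Eq. (14): symmetric, stochastic, supported on A.

    Assumes A is symmetric with A(i, i) = 1, so that the positive diagonal is allowed.
    """
    A0 = A - np.diag(np.diag(A))  # off-diagonal links only
    d = A0.sum(axis=1)
    return np.eye(A.shape[0]) - (np.diag(d) - A0) / (d.max() + 1)
```

Any candidate produced by a convex solver can be compared with this baseline through `objective_14`, since for feasible symmetric matrices it coincides with $\lambda(P)$.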

We remark that the mixing time of $P_{KS}$ is smaller than the mixing time of $P_{mix}$. This is consistent with the fact that $P_{KS}$ is obtained by optimizing over the whole space of stochastic matrices compatible with $A$, whereas $P_{mix}$ is restricted to the space of symmetric matrices. Nevertheless, we can go a step further and compute, for a fixed stationary distribution, the reversible matrix which minimizes the mixing time. Let $\pi$ denote the stationary measure and $\Pi = \mathrm{diag}(\pi)$. Then $P$ is reversible if and only if $\Pi P = P^t \Pi$. In particular, $\Pi^{1/2} P\, \Pi^{-1/2}$ is symmetric and has the same eigenvalues as $P$. Finally, $q = (\sqrt{\pi_1}, \dots, \sqrt{\pi_m})^t$ is an eigenvector of $\Pi^{1/2} P\, \Pi^{-1/2}$ associated with the eigenvalue 1. Then the minimization problem can be written as the following system:


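$$\min_{P} |||(I_d - qq^t)\, \Pi^{1/2} P\, \Pi^{-1/2}\, (I_d - qq^t)||| = \min_{P} |||\Pi^{1/2} P\, \Pi^{-1/2} - qq^t||| \quad \text{subject to} \quad P(i,j) \geq 0,\ P\mathbf{1} = \mathbf{1},\ \Pi P = P^t \Pi,\ A(i,j) = 0 \Rightarrow P(i,j) = 0. \tag{15}$$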
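The same numerical check applies to Eq. (15): for a matrix $P$ that is reversible with respect to $\pi$, the objective $|||\Pi^{1/2} P\, \Pi^{-1/2} - qq^t|||$ equals $\lambda(P)$. As before, this sketch only evaluates the objective and tests detailed balance; it does not solve the optimization problem, and the function names are hypothetical.

```python
import numpy as np


def objective_15(P, pi):
    """|||Pi^(1/2) P Pi^(-1/2) - q q^t||| with q = sqrt(pi), in the spectral norm."""
    q = np.sqrt(pi)
    S = q[:, None] * P / q[None, :]  # Pi^(1/2) P Pi^(-1/2)
    return np.linalg.norm(S - np.outer(q, q), 2)


def is_reversible(P, pi):
    """Detailed balance Pi P = P^t Pi, i.e. the flow matrix pi_i p_ij is symmetric."""
    F = pi[:, None] * P
    return np.allclose(F, F.T)
```

An easy reversible test case (our choice) is the simple random walk $p_{ij} = A_{ij}/k_i$ on a symmetric $A$, with $\pi_i = k_i / \sum_j k_j$; `objective_15` then matches the second largest eigenvalue modulus returned by `np.linalg.eigvals`.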

When we implement this problem in Matlab with $\pi = \pi_{KS}$, the stationary density of $P_{KS}$, we find a matrix $P_{mix}$ such that, naturally, $\lambda(P_{mix}) \leq \lambda(P_{KS})$. Moreover, we can compare both dynamics by evaluating $|||P_{KS} - P_{mix}|||$ relative to $|||P_{KS}|||$, which is approximately equal to $|||P_{mix}|||$. We remark that $|||P_{KS} - P_{mix}|||$ depends on the density $\rho$ of zeros in the matrix $A$. For a density equal to 0, the matrices $P_{KS}$ and $P_{mix}$ are equal, and the quantity $|||P_{KS} - P_{mix}|||$ increases continuously as $\rho$ increases. This is shown in Fig. 3.
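The limiting case $\rho = 0$ can be verified directly. When all entries of $A$ equal 1 (self-loops included, an assumption of this sketch), $\psi$ is uniform, $\lambda = m$ and $P_{KS} = \frac{1}{m}\mathbf{1}\mathbf{1}^t$, whose $\lambda(P)$ is 0, the smallest possible value, so $P_{KS}$ is then also the minimizer of Eq. (15) for the uniform $\pi$. The relative difference plotted in Fig. 3 is a one-line helper; $P_{mix}$ itself has to come from a solver such as the Matlab implementation mentioned above, which this sketch does not reproduce. Function names are hypothetical.

```python
import numpy as np


def max_entropy_walk(A):
    """P_KS of Eq. (7) for a symmetric, irreducible adjacency matrix A."""
    w, V = np.linalg.eigh(A)
    lam, psi = w[-1], np.abs(V[:, -1])  # Perron eigenvalue and positive eigenvector
    return A * psi[None, :] / (lam * psi[:, None])


def relative_difference(P_ks, P_mix):
    """|||P_KS - P_mix||| / |||P_KS||| in the spectral norm (the quantity of Fig. 3)."""
    return np.linalg.norm(P_ks - P_mix, 2) / np.linalg.norm(P_ks, 2)
```

For `A = np.ones((5, 5))`, `max_entropy_walk(A)` is the matrix with all entries equal to 0.2.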
FIG. 3: $|||P_{KS} - P_{mix}||| / |||P_{KS}|||$ as a function of the density $\rho$ of zeros in $A$.

From this, we conclude that the rules which maximize the KSE are close to those which minimize the mixing time. This approximation becomes increasingly accurate as the fraction of removed links in $A$ decreases. Since the computation of $P_{mix}$ quickly becomes tedious for large values of $m$, we offer here a much cheaper alternative: computing $P_{KS}$ instead of $P_{mix}$.

Moreover, maximizing the KSE appears today as a method to describe out-of-equilibrium complex systems [8], to find natural behaviors [4], or to define optimal diffusion coefficients in diffusion networks. This general observation also provides a possible rationale for the selection of stationary states in out-of-equilibrium physics: it seems reasonable that, in a physical system with two simultaneous, equiprobable possible dynamics, the final stationary state will be closer to the stationary state corresponding to the fastest dynamics (smallest mixing time). Through the link found in this letter, this state will correspond to a state of maximal KSE. If this is true, it would provide a more satisfying rule for selecting stationary states in complex systems such as the climate than the maximization of entropy production, as already suggested in [5].